AITopics | training algorithm

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Neural Information Processing Systems, Jun-20-2026, 09:11:35 GMT

Training data attribution (TDA) is concerned with understanding model behavior in terms of the training data. This paper draws attention to the common setting where one has access only to the final trained model, and not the training algorithm or intermediate information from training.
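
As a rough illustration of what a gradient-based attribution score can look like when only the final model is available, the sketch below scores each training example by the dot product between its loss gradient and a test example's loss gradient, both evaluated at the final weights. The logistic-regression model, the toy data, and the names `grad_logloss` and `grad_dot_scores` are assumptions made for illustration; this is not presented as the paper's method.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_logloss(w, x, y):
    """Gradient of the logistic loss for one example (y in {0, 1}) w.r.t. w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def grad_dot_scores(w, train, test_point):
    """Score each training example by <grad(train_i), grad(test)> at weights w."""
    g_test = grad_logloss(w, *test_point)
    return [
        sum(a * b for a, b in zip(grad_logloss(w, x, y), g_test))
        for x, y in train
    ]

# Hypothetical final weights and toy data (illustrative only).
w_final = [0.5, -0.25]
train = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([-1.0, 0.0], 1)]
test_point = ([1.0, 0.0], 1)

scores = grad_dot_scores(w_final, train, test_point)
for (x, y), s in zip(train, scores):
    print(x, y, round(s, 4))
```

Under this scoring, a positive value means a small gradient step on that training example would also lower the test loss, a negative value means it would raise it, and a value near zero means the two gradients are close to orthogonal.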

artificial intelligence, machine learning, similarity, (17 more...)

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)

Latent Chain-of-Thought for Visual Reasoning

Neural Information Processing Systems, Jun-13-2026, 08:44:02 GMT

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well across unseen reasoning tasks and heavily rely on a biased reward model. To address this challenge, we reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. By leveraging diversity-seeking reinforcement learning algorithms, we introduce a novel sparse reward function for token-level learning signals that encourage diverse, high-likelihood latent CoT, overcoming deterministic sampling limitations and avoiding reward hacking. Additionally, we implement a Bayesian inference-scaling strategy that replaces costly Best-of-N and Beam Search with a marginal likelihood to efficiently rank optimal rationales and answers. We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on four reasoning benchmarks, in terms of effectiveness, generalization, and interpretability.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Training Transformers with 4-bit Integers

Neural Information Processing Systems, Apr-29-2026, 03:27:06 GMT

Quantizing activations, weights, and gradients to 4 bits is a promising way to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats that are not supported by contemporary hardware. In this work, we propose a training method for transformers in which all matrix multiplications are implemented with INT4 arithmetic. Training at ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activations and gradients in transformers and propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress them.
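
To make the outlier problem concrete, the sketch below quantizes a toy activation row to 4-bit integers twice: once directly, and once after an orthonormal Hadamard rotation that spreads the outlier across all coordinates. The symmetric per-tensor scale, the clipping range [-7, 7], the toy values, and all function names are assumptions made for illustration; the abstract does not specify the quantizer at this level of detail.

```python
import math

def hadamard(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]

def quantize_int4(v):
    """Symmetric per-tensor quantization to integers in [-7, 7]; returns (ints, scale)."""
    scale = max(abs(x) for x in v) / 7.0 or 1.0  # fall back to 1.0 for an all-zero input
    q = [max(-7, min(7, round(x / scale))) for x in v]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy activation row with one large outlier (illustrative values).
x = [0.1, -0.2, 0.3, 0.05, -0.15, 0.25, -0.1, 8.0]

# Direct quantization: the outlier sets the scale, so the small entries round to 0.
err_plain = sq_error(x, dequantize(*quantize_int4(x)))

# Hadamard path: rotate, quantize, dequantize, rotate back.
# The orthonormal transform is its own inverse.
x_rec = hadamard(dequantize(*quantize_int4(hadamard(x))))
err_hadamard = sq_error(x, x_rec)

print(err_plain, err_hadamard)
```

On this toy row the rotated path has the lower reconstruction error, because after the rotation no single coordinate dominates the quantization scale. How large the gain is on real activations depends on the data.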

artificial intelligence, machine learning, natural language, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

48cb136b65a69e8c2aa22913a0d91b2f-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 17:45:41 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

UnfoldML_Nuerips

Yanbo Xu

Neural Information Processing Systems, Apr-24-2026, 23:49:22 GMT

Algorithm 1 Hard-gating Algorithm for In-Stage IDK Cascade
Input
  Ds: Training data containing Ns samples in stage-s
  Ms: Sorted list of the models trained for stage-s
  C: Dictionary of models' spatio-temporal costs
  cs: User-defined budget of spatio-temporal cost for stage-s
  q: Confidence function
  maxA: Value for the upper bound of the cutoffs to avoid over-fitting
  nBins: Number of bins for the grid search
Output
  s: The optimal IDK cutoff vector for stage-s
1: procedure HARDGATING(Ds, Ms, cs, C, q, maxA, nBins)
2: s =[], ModelAssign = 1, cost = P

We use the Sepsis-3 toolkit to obtain the suspected infection time in patients, and follow the process in Seymour et al. (2016) to label the onset of sepsis. This yields a total of 20,009 sepsis patients out of the 52,902 adult patients in the MIMIC-III database. We exclude patients who stayed in the ICU for less than 6 hours, as well as patients who developed sepsis within the first 6 hours after ICU admission. This reduces our cohort to a total of 34,475 ICU patients, and only 2,370 (6.8%)

Then according to Singer et al. (2016), we identify the onset of septic shock as

Algorithm 3 End-to-End Training algorithm for UnfoldML
Input
  D: Full training data containing N instances
  M: Full model zoo
  C: Dictionary of models' spatio-temporal costs
  q: Confidence criterion
Output: the optimal ICK1 gate parameters (or a,b): the optimal IDK gate parameters
1: procedure END-TO-ENDTRAINING (D, M)
2: Pre-allocate costs cs for each stage s.

Figure 4: Transitions in model calls: both cascades always call the first model in each stage as the entry point, then transition to the next model (IDK) or to the next stage (ICK).
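
The gating idea behind an IDK ("I don't know") cascade can be sketched at inference time as follows: models are tried in order of cost, and a prediction is accepted only if its confidence reaches that model's cutoff. This is a minimal sketch that assumes the cutoffs are already given; it does not reproduce the grid search or cost budget of Algorithm 1, and `idk_cascade`, `cheap_model`, and `expensive_model` are hypothetical names.

```python
import math

def idk_cascade(x, models, cutoffs):
    """Run models in order and accept the first prediction whose confidence
    reaches its cutoff; otherwise fall through to the next model.
    The last model always answers. Returns (prediction, index_of_model_used)."""
    assert len(cutoffs) == len(models) - 1
    for i, model in enumerate(models[:-1]):
        pred, conf = model(x)
        if conf >= cutoffs[i]:
            return pred, i
    pred, _ = models[-1](x)
    return pred, len(models) - 1

# Hypothetical stand-in models; each returns (prediction, confidence).
def cheap_model(x):
    p = 1.0 / (1.0 + math.exp(-x))
    return int(p >= 0.5), max(p, 1.0 - p)

def expensive_model(x):
    return int(x > 0.2), 1.0

models = [cheap_model, expensive_model]
cutoffs = [0.9]  # assumed cutoff for the cheap model

print(idk_cascade(3.0, models, cutoffs))  # cheap model is confident enough
print(idk_cascade(0.1, models, cutoffs))  # cheap model says "I don't know"
```

Raising a cutoff sends more inputs to the costlier models, which is the trade-off that a cutoff search under a cost budget has to balance.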

artificial intelligence, machine learning, shock patient, (16 more...)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Uniform Sampling over Episode Difficulty

Neural Information Processing Systems, Apr-24-2026, 15:32:59 GMT

Episodic training is a core ingredient of few-shot learning to train models on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask the question: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform an extensive analysis and find that sampling uniformly over episode difficulty outperforms other sampling schemes, including curriculum and easy-/hard-mining. As the proposed sampling method is algorithm agnostic, we can leverage these insights to improve few-shot learning accuracies across many episodic training algorithms. We demonstrate the efficacy of our method across popular few-shot learning datasets, algorithms, network architectures, and protocols.
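
One simple way to picture sampling uniformly over difficulty is to reweight a pool of episodes so that every difficulty level is drawn equally often, regardless of how many episodes fall at that level. The sketch below does this with a histogram: each episode is weighted by the inverse of its bin count. The histogram estimator, the toy difficulty values, and the function name `uniform_difficulty_weights` are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter

def uniform_difficulty_weights(difficulties, n_bins=5):
    """Return (weights, bins): each episode is weighted by 1 / (size of its
    difficulty bin), so every non-empty bin carries the same total weight."""
    lo, hi = min(difficulties), max(difficulties)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    bins = [min(int((d - lo) / width), n_bins - 1) for d in difficulties]
    counts = Counter(bins)
    weights = [1.0 / counts[b] for b in bins]
    return weights, bins

# Toy pool: many easy episodes, few hard ones (illustrative values).
difficulties = [0.1] * 8 + [0.9] * 2
weights, bins = uniform_difficulty_weights(difficulties)

rng = random.Random(0)
sampled = rng.choices(range(len(difficulties)), weights=weights, k=1000)
hard_fraction = sum(1 for i in sampled if difficulties[i] > 0.5) / len(sampled)
print(hard_fraction)
```

Without reweighting, hard episodes would make up about 20% of draws from this pool; with the weights above, easy and hard episodes are drawn in roughly equal proportion.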

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > California (0.46)
North America > United States > New York (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Media > Television (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

IM-Loss: Information Maximization Loss for Spiking Neural Networks

Neural Information Processing Systems, Apr-24-2026, 07:36:41 GMT

The conditional entropy H(O|U) can be expressed as the equation below, according to Eq. 5 and Eq. 7.

I(U;O) = H(O)   (10)

A.2 Algorithm

The proposed training algorithm of an SNN is presented in Algo. 1.

Algorithm 1 The proposed training algorithm of an SNN.
Input: Initialized SNN; training dataset; total training epochs, I; training iterations per epoch, J.
Output: The trained SNN.
W, where η is the learning rate.

artificial intelligence, information maximization loss, machine learning, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Quantum Perceptron Models

Ashish Kapoor, Nathan Wiebe, Krysta Svore

Neural Information Processing Systems, Apr-22-2026, 07:04:46 GMT

We demonstrate how quantum computation can provide non-trivial improvements in the computational and statistical complexity of the perceptron model. We develop two quantum algorithms for perceptron learning. The first algorithm exploits quantum information processing to determine a separating hyperplane using a number of steps sublinear in the number of data points N, namely O(√N). The second algorithm illustrates how the classical mistake bound of O(1/γ²) can be further improved to O(1/√γ) through quantum means, where γ denotes the margin. Such improvements are achieved through the application of quantum amplitude amplification to the version space interpretation of the perceptron model.
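
For reference, the classical baseline that the second result improves on can be checked numerically. The sketch below runs the standard classical perceptron on unit-norm, linearly separable toy data and compares the number of updates with the classical bound 1/γ², where γ is the margin of a known unit-norm separator. The data, the separator `w_star`, and the function names are assumptions made for illustration; nothing quantum is simulated here.

```python
import math

def perceptron_mistakes(points, labels, max_epochs=1000):
    """Classical perceptron; returns (weights, number_of_updates)."""
    w = [0.0] * len(points[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, y in zip(points, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # update on a mistake
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, mistakes

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

# Toy data separable by the unit vector w_star = (1, 0); points are normalized.
raw = [(1.0, 2.0), (1.0, -3.0), (2.0, 0.5), (-1.0, 2.5), (-1.0, -2.0), (-3.0, 1.0)]
points = [unit(p) for p in raw]
labels = [1, 1, 1, -1, -1, -1]
w_star = [1.0, 0.0]

# Margin of w_star on the normalized data, and the classical mistake bound.
gamma = min(y * sum(a * b for a, b in zip(w_star, x)) for x, y in zip(points, labels))
bound = 1.0 / gamma ** 2

w, mistakes = perceptron_mistakes(points, labels)
print(mistakes, bound)
```

The number of updates never exceeds the bound on separable unit-norm data, although on easy instances like this one it can be far below it.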

algorithm, artificial intelligence, machine learning, (17 more...)

Country: